-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new metric type: Gauge
+ CurrentMemoryUsage
to metrics
#1682
Conversation
impl AggregatedMetricsSet { | ||
/// Create a new aggregated set | ||
pub(crate) fn new() -> Self { | ||
pub fn new() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To create a memory-managed version of ShuffleWriter
, we need to expose these four APIs for dependent crate usage.
I'm not sure this should be in its own PR, or I can have it here since this PR is also metric-related. I can remove this if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to be in this PR
datafusion/src/physical_plan/mod.rs
Outdated
/// Returns the current memory usage for this stream. | ||
fn mem_used(&self) -> usize { | ||
0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line adds a mem_used
method in our essential RecordBatchStream
trait.
A baby step to tracking Non-Limited-Operators' memory usage since I think SendableRecordBatchStream
is the fundamental entity that holds memory during execution. However, I didn't quite find a way to register these streams generated during async execute
to our memory manager.
I would love to hear your thoughts.
If considered not appropriate, I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After re-consider this for a while. I cannot think of a solution to register RecordBatchStream
somewhere else since each stream is used through mutable reference. I don't think there's a way sharing it except for Arc<dyn RecordBatchStream ....>, which we have discussed earlier and is not acceptable.
I'm going to revert this last commit.
Gauge
Gauge
+ CurrentMemoryUsage
to metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @yjshen
impl AggregatedMetricsSet { | ||
/// Create a new aggregated set | ||
pub(crate) fn new() -> Self { | ||
pub fn new() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to be in this PR
use std::sync::Arc; | ||
use tempfile::NamedTempFile; | ||
use tokio::sync::mpsc::{Receiver as TKReceiver, Sender as TKSender}; | ||
use tokio::sync::mpsc::{Receiver, Sender}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
And thank you for the review @liukun4515 |
Which issue does this PR close?
Closes #.
Rationale for this change
A gauge metric is useful for reporting purposes by returning a value each time, and a gauge suits well for memory consumption reports.
What changes are included in this PR?
Gauge
CurrentMemoryUsage
toBaselineMetrics
gauge
in sort for memory tracking.Are there any user-facing changes?
No.